Abstract

Background: Gene regulation underlies fungal physiology and therefore is a major factor in fungal biodiversity. 
Analysis of genome sequences has revealed a large number of putative transcription factors in most fungal 
genomes. The presence of fungal orthologs for individual regulators has been analysed and appears to be highly 
variable with some regulators widely conserved and others showing narrow distribution. Although genome-scale 
transcription factor surveys have been performed before, no global study into the prevalence of specific regulators 
across the fungal kingdom has been presented. 

Results: In this study we have analysed the number of members for 37 regulator classes in 77 ascomycete and 31 
basidiomycete fungal genomes and revealed significant differences between ascomycetes and basidiomycetes. In 
addition, we determined the presence of 64 regulators characterised in ascomycetes across these 108 genomes. 
This demonstrated that overall the highest presence of orthologs is in the filamentous ascomycetes. A significant 
number of regulators lacked orthologs in the ascomycete yeasts and the basidiomycetes. Conversely, of seven 
basidiomycete regulators included in the study, only one had orthologs in ascomycetes. 

Conclusions: This study demonstrates a significant difference in the regulatory repertoire of ascomycete and 
basidiomycete fungi, at the level of both regulator class and individual regulator. This suggests that the current 
regulatory systems of these fungi have been mainly developed after the two phyla diverged. Most regulators 
detected in both phyla are involved in central functions of fungal physiology and therefore were likely already 
present in the ancestor of the two phyla. 

Keywords: Transcription factor, Ascomycete, Basidiomycete, Gene regulation, Fungal genomes, Evolution, Zinc
binuclear cluster, Zinc finger, DNA binding domain, Aspergillus



Background 

Gene regulation is of major importance for physiology of 
all organisms, and has been intensively studied in fungi. 
It ensures that the required genes are switched on and
act under the circumstances in which they are needed, and allows
fungi to respond to changing conditions. Thirty-seven
classes of regulator proteins have been identified in fungi 
[1], such as C2H2 (PF00096) [2], Zn2Cys6 (PF00172) [3], 
Fungal Specific transcription factor domain (PF04082), 
bZIP (PF00170) [4], Histone-like transcription factors 
(PF00808) [5], HLH (PF00010) [6], HSF (PF00447) [7],
Myb DNA-binding (PF00249) [8], TEA (PF01285) [9]
and GATA (PF00320) [10]. They coordinate many cellular 
processes that control growth, survival or reproduction 
on particular substrates, under certain conditions, or in 
particular environmental niches. Therefore the presence 
or absence of specific regulators is intimately linked to 
fungal biodiversity. 

Analysis of the first available eukaryotic genome indi- 
cated a likely diversity of regulators [11]. For example, 
a number of Zn2Cys6 regulators known in other fungi 
were absent in Saccharomyces cerevisiae [12]. Differences 
in regulatory protein repertoire were found particularly 
for this class of regulators, which was reduced in number 
in Kluyveromyces lactis compared with S. cerevisiae [13],
and considerably expanded in Aspergillus nidulans [14] 
and Magnaporthe oryzae [15]. Furthermore, a range of 
functions for fungal Zn2Cys6 regulators lacking yeast
orthologs have been described [16]. With an exponentially 
growing number of fungal genome sequences covering all 
branches of the fungal tree of life it is now possible to 
explore the regulatory diversity of fungi and trace the evo- 
lutionary origin of particular regulators. For several spe- 
cific regulators their presence in sets of fungal genomes 
has been reported. The pentose catabolic pathway in 
Aspergillus is regulated by two transcriptional activators, 
XlnR and AraR [17]. While XlnR is present in nearly all 
tested filamentous ascomycetes, AraR appears to be re- 
stricted to the order of the Eurotiales that consists of 
Aspergillus, Penicillium and related genera. An even higher 
diversity was observed for regulators of galactose catabol- 
ism. A subset of the ascomycetes contains the regulator 
GalX that appears to be mainly involved in the oxidoreductive
pathway in Aspergillus niger and A. nidulans
[18,19]. In addition, A. nidulans contains a second
regulator, GalR, that controls genes of the Leloir Pathway 
and for which orthologs were not detected in any of the 
other studied species [18]. The A. nidulans long chain
fatty acid utilisation regulators FarA and FarB, which 
themselves are related in sequence, each have orthologs 
widely conserved in filamentous fungi and share a single 
common homolog in certain Hemiascomycetes [20]. In 
contrast, the short chain fatty acid utilisation regulator 
ScfA was very poorly conserved, with possible orthologs 
in A. nidulans, Aspergillus fumigatus and Neurospora
crassa [20]. The A. niger extracellular protease regulator
PrtT was identified only in certain Aspergilli [21].

Previous genome-wide studies of transcription factors 
have focussed on a single transcription factor family 
[12,22,23], a single species [24], or the relative represen- 
tation of transcriptional regulator classes in the fungal 
kingdom [1,25]. The rapid growth in availability of fun- 
gal genomes, particularly those of Basidiomycetes, over 
the last few years has now yielded wider representation 
of genome sequence data across the various lineages of 
the fungal kingdom and provides the opportunity for a 
more detailed analysis of prevalence of transcription reg- 
ulators across fungal genomes. In this paper we com- 
pared the distribution of regulator gene classes between 
currently available fungal genomes. We analysed the 
presence or absence of 64 characterised regulators in 
108 fungal genomes to provide a comprehensive evalu- 
ation of fungal diversity with respect to regulatory sys- 
tems. The regulators we have focussed on are all well 
characterised in at least one fungal species and represent 
a range of different physiological functions, including 21 
regulators involved in development and/or morphology, 
19 regulators involved in carbon metabolism, and 13 
regulators involved in nitrogen and amino acid metabol- 
ism. Many of these regulators perform central functions 
in the organisms where they have been initially studied
and therefore provide a good test set for the analysis of
their prevalence and evolution in the fungal kingdom. 

Results 

Distribution of regulator classes throughout the 
fungal kingdom 

To determine whether there are major differences in the 
relative number of regulators from different classes in 
the different fungal phyla, a PFAM analysis of the 37 
known fungal transcription regulator-related PFAM do- 
mains [1] was performed on 77 ascomycete and 31 ba- 
sidiomycete genomes (Additional file 1). A total of 36,636 
putative transcription factors were identified (Additional 
file 2). Interesting differences in the relative number of 
regulators from different PFAM classes could be observed 
between the two phyla (Figure 1, Additional file 3). When 
comparing Ascomycota and Basidiomycota the main 
differences are a much larger expansion of the Zn2Cys6 
domain family (PF00172) and the fungal specific transcription
factor domain proteins (PF04082) in the Ascomycota,
while in the Basidiomycota the C2H2 family (PF00096) 
and the CCHC zinc-finger family (PF00098) form a sig- 
nificantly higher percentage of the total number of reg- 
ulators (Figure 1). This indicates that after these phyla 
split different regulatory strategies have developed based 
on different regulator classes. Within the Ascomycota, 
pezizomycetes contained the highest average number
(450) of putative regulators compared to saccharomycetes
(210) and taphrinomycetes (122). Moreover, the Zn2Cys6 
domain family and fungal specific transcription factor 
domain proteins in pezizomycotina were found in higher 
proportions than in the saccharomycotina and taphri- 
nomycotina indicating the major expansion of these 
regulator classes occurred after divergence of the pezizomy- 
cotina from the other lineages. Unlike in the Basidiomycota, 
the lower abundance of these two families in the saccharomycotina
and taphrinomycotina is not accompanied
by a higher abundance of the C2H2 and
CCHC families.
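
The comparison of relative class sizes described above can be illustrated with a minimal sketch. It assumes one PFAM accession per putative regulator; the helper name `class_proportions` and the toy hit lists are hypothetical and are not taken from Additional file 2.

```python
from collections import Counter

def class_proportions(domain_hits):
    """Return each PFAM class as a percentage of all regulators in one group.

    domain_hits: iterable of PFAM accessions, one per putative regulator.
    """
    counts = Counter(domain_hits)
    total = sum(counts.values())
    return {pfam: 100.0 * n / total for pfam, n in counts.items()}

# Hypothetical toy data (NOT values from the study).
ascomycete_hits = ["PF00172"] * 5 + ["PF04082"] * 3 + ["PF00096"] * 2
basidiomycete_hits = ["PF00172"] * 2 + ["PF00096"] * 4 + ["PF00098"] * 4

asco = class_proportions(ascomycete_hits)
basidio = class_proportions(basidiomycete_hits)

for pfam in sorted(set(asco) | set(basidio)):
    # A class absent from a group is reported as 0%.
    print(pfam, asco.get(pfam, 0.0), basidio.get(pfam, 0.0))
```

Expressing each class as a share of the group total, rather than as a raw count, is what allows genomes with very different regulator numbers to be compared.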

To compare the PFAM distribution of transcription 
factors between different fungal species, we used hier- 
archical clustering. The 36,636 regulators identified in 
the PFAM analysis were clustered, using OrthoMCL 
followed by manual curation, into 2,887 non-redundant 
orthologous groups (Additional file 4). These ortholog 
groups were then used to analyze the distribution of 
putative regulators among the PFAM families in fungi. A 
clear trend of regulator family distribution could be 
detected when species were clustered based on the 
transcription factor abundance pattern of the families 
(Figure 2). Interestingly, within Ascomycota only pezizomycotina
species were clustered as one distinct group; the other major
subdivisions, saccharomycotina and taphrinomycotina, were clustered
within the Basidiomycota as two separate groups next to
agaricomycotina. This indicates that after these phyla diverged,
different regulator classes have been exploited in different lineages.

Figure 1 Relative distribution of regulator PFAM family members in different fungal phyla. A: ascomycetes, B: basidiomycetes, C: pezizomycotina;
D: saccharomycotina; E: taphrinomycotina. The description of the PFAM families can be found in Additional file 3. The average number of
transcription factors for each phylum is indicated underneath each pie chart. The number of genomes analyzed in each phylum or subphylum
is indicated in parentheses.

Prevalence of specific regulators in fungal genomes 

A pilot study using bidirectional BlastP analysis to iden- 
tify putative orthologs of a subset of chosen regulators 
was performed followed by manual curation to deter- 
mine the parameters for automated analysis of the 
prevalence of regulators in fungal genomes (data not 
shown). This automated analysis was used to test for 
the presence or absence of orthologs of 64 regulators 
(Additional file 5) in the 108 genomes used for the 
PFAM distribution analysis above (Additional file 6, 
Additional file 7). A cut-off designated for identification 
of distant homologs [26] was applied throughout the 
survey in order to decrease the false negative rate caused 
by the highly divergent sequences of regulators. The re- 
sults were then manually curated based on sequence 
alignments and phylogeny to remove false positives. An
example is presented for AraR, where GalR was identified
as a false positive (Figure 3).

Although the lowest number of orthologs was identi- 
fied in the Basidiomycota, orthologs for five regulators 
involved in development and/or morphology (DopA, 
SteA, RlmA, MedA, Con7), the carbon catabolite repres- 
sor CreA, and the general expression activators HapB, 
HapC and HapE, are commonly found in basidiomy- 
cetes. Most of these regulators have general functions 
for fungal physiology, which explains their common 
distribution among fungi. Conversely, orthologs for six 
of the seven regulators from Schizophyllum commune 
(Fts3, Fts4, Hom1, Hom2, Gat1, C2H2) were only detected
in basidiomycetes, while the other (WC2) also
had orthologs in filamentous ascomycetes. Interestingly, 
no basidiomycete orthologs were detected for any of 
the transcriptional activators involved in plant biomass 
utilization (XlnR, AmyR, InuR, AraR, GalR, GalX, RhaR). 
All these regulators are members of the Zn2Cys6 class 
(Additional file 5), which is particularly expanded in 



Todd et al. BMC Genomics 2014, 15:214 
http://www.bionnedcentral.conn/1471 -21 64/1 5/214 



Page 4 of 1 2 



isiiliiiilli 



III. Ill 11 

1 I I 1 = ^ . § I if ^ 1 1 
siilltlfill! Iilfl lii 

II I I I ^ II I 1 I I 3 I II I I E I ^ I 

^ ^ " 1 ¥ ^ •§ 1 - - " " " - =. 



S S5 ^ 




Figure 2 Hierarchical clustering of fungal species by the abundance of regulators in PFAM families. The difference between species in 
abundance of eacli PFAM family is sliown. Values of presence and absence patterns were normalized by z-transformation across PFAM families 
and coloured so that green indicates the value is below the median for that PFAM family, whereas red indicates the value is higher than the 
median. The brighter the green, the lower the abundance across species, whereas the brighter the red, the higher the abundance across species. 
The largest PFAM class for each species is marked by the white dot in the corresponding colour square. 



ascomycetes compared to basidiomycetes (Figure 1). 
Indications for similar regulation systems related to 
plant biomass degradation have been found in transcripto- 
mics studies of basidiomycetes [27-33]. However, the ab- 
sence of orthologs for the ascomycete regulators suggests 
that these regulators have developed after the split from 
the ascomycetes, and the underlying molecular mecha- 
nisms may differ. 

The overall low number of regulators for which an 
ortholog could be found in the basidiomycetes fits with 
the general PFAM distribution (see above) in which clear 
differences were found in the expansion of the different 
PFAM families between ascomycetes and basidiomycetes.
This suggests a smaller regulatory repertoire in
the ancestral fungus, which has undergone significant
evolution since the basidiomycetes and ascomycetes 
separated. 

The ascomycete yeast genomes also lack a significant 
number of the regulators, in particular those involved in 
plant biomass degradation and those involved in develop- 
ment. As most yeasts are not able to degrade plant bio- 
mass, nor go through developmental changes, this fits 
well with their physiology. Interestingly, there is a division 
into two groups with respect to the presence of CreA 
orthologs. Saccharomyces lacks this regulator, but instead 
has MIG1, which is the functional homolog of CreA, despite
low sequence similarity. MIG1 orthologs were not
found in any of the other tested fungi (Additional file 7). 
Figure 3 Example of phylogenetic identification of false orthologs. Neighbor-Joining tree of the AraR homologs using Aspergillus nidulans XlnR
as an outgroup. The genes marked in yellow were maintained in the comparison. AN10550 was manually removed, as it clearly did not fall into
the same cluster as the other genes. In fact, this gene is GalR, which is unique to A. nidulans. The gene identifiers can be found in Additional file 7.



The most diverse profiles can be seen for the filament- 
ous ascomycetes. Regulators that are particularly poorly 
conserved in this group include two involved in develop- 
ment and/or morphology (AbaA, BrlA, mainly limited to 
Aspergilli and Penicillium), seven involved in carbon 
metabolism (AraR, GalX, GalR, ScfA, InuR, AlcR, AceII),
one involved in nitrogen metabolism (AmdR), the iron 
homeostasis regulator SreA, the unfolded protein re- 
sponse regulator HacA and the aflatoxin biosynthesis 
regulator AflR, but many other differences can be ob- 
served. While the presence of some of these regulators 
appears to be evolutionarily related (present in nearly 
all species of a certain fungal clade) others are more 
dispersed through the ascomycete tree of life, suggesting 
that the regulator was present in their common ancestor 
but has been lost in specific species of different lineages. 

Present in nearly all of the filamentous Ascomycetes 
(with a cut-off of three genomes missing the regulator) 
are eleven regulators involved in development and/or 
morphology (DopA, RosA, SteA, RlmA, MedA, DevR, 
Hsf2, Con7, StuA, WC2, VeA), four involved in carbon 
metabolism (FacB, FarB, AceI, AmdX), seven involved
in nitrogen metabolism (UaY, LeuB, CpcA, NirA, AreB
alpha, Nut1, NmrA), the His-Asp phosphorylation signalling
regulator SrrA, the CCAAT-binding complex
components HapB, HapC and HapE, the sulphur meta- 
bolic regulator MetR, and the penicillin biosynthesis 
regulator PenR2, providing a core set of transcriptional 
regulators that control most aspects of physiology. 
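
The selection of this core set can be sketched as a simple filter on a presence/absence table. This is a minimal illustration, assuming the cut-off means "absent from at most three genomes"; the function name `core_regulators`, the regulator labels and the genome labels are hypothetical and do not reproduce Additional file 7.

```python
def core_regulators(presence, genomes, max_missing=3):
    """Return regulators absent from at most `max_missing` of the given genomes.

    presence: dict mapping regulator name -> set of genomes with an ortholog.
    genomes:  list of all genomes in the group being examined.
    """
    core = []
    for regulator, present_in in presence.items():
        missing = sum(1 for genome in genomes if genome not in present_in)
        if missing <= max_missing:
            core.append(regulator)
    return sorted(core)

# Hypothetical toy data (NOT values from the study).
genomes = [f"genome_{i}" for i in range(10)]
presence = {
    "RegA": set(genomes),      # ortholog in every genome
    "RegB": set(genomes[:7]),  # missing from three genomes
    "RegC": set(genomes[:5]),  # missing from five genomes
}

core = core_regulators(presence, genomes)
print(core)
```

Allowing a small number of missing genomes tolerates occasional gaps in sequence or annotation without excluding a regulator that is otherwise conserved across the group.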

For some transcription factors, multiple homologs were 
identified in the same species. In those cases where 
manual curation did not allow elimination of the add- 
itional copies, they were retained in the output data set 
(Additional file 3, Additional file 4, Additional file 6 
and Additional file 7). We do not assume that the 
different copies will have the same function, although 
they are likely involved in similar processes. Functional
analysis of these proteins will be needed to reveal their
biological role. 

Discussion 

Genomic studies of fungal transcription regulators have 
generally focused on a single transcription factor class in 
a particular species (e.g. [12,22,23]), or on the transcrip- 
tion factor complement within one species (e.g. [15,23]). 
However, analyses of transcription factor families have 
been conducted across a range of fungal genomes ([1,25]). 
One study identified 37 PFAM families of transcription 
factors represented in 62 fungal genomes, and revealed 
the Zn2Cys6 zinc binuclear cluster and the fungal-specific 
transcription factor domain as the two largest fungal tran- 
scription factor classes [1]. Another study focussed on 
identification of transcription factors in 62 fungal genomes 
using the Fungal Transcription Factor Database (FTFD) 
phylogenomics pipeline, and determined the proportion 
of transcription factors amongst total predicted proteins 
[25]. Analysis of transcription factor family distribution 
revealed species-specific differences [25]. The aim of our 
study was to perform an inventory of the presence of reg- 
ulators in the fungal kingdom, employing the expanded 
set of genome sequences that have become available in 
the last five years. Our analysis of the distribution of regu- 
lator classes indicated differential expansion of certain 
regulator types in ascomycetes and basidiomycetes, con- 
sistent with the development of many regulatory systems 
from a more limited ancestral set of regulators after the 
divergence of the two major fungal phyla. In ascomycetes, 
the Zn2Cys6 and fungal-specific domain regulators overwhelmingly
predominated. This is consistent with previous
identification of these two regulator classes as the
most abundant fungal-specific regulators in a smaller set 
of mostly ascomycete fungal genomes [1]. The C2H2 zinc 
finger class comprised a smaller but major regulator class 
in the ascomycetes. Further analysis revealed a greater 
relative abundance of the Zn2Cys6 and fungal-specific
domain regulators in the pezizomycotina than in the sac- 
charomycotina and taphrinomycotina. In basidiomycetes, 
the C2H2 and CCHC classes showed a relative expansion 
and, with the Zn2Cys6 and fungal-specific domain regula- 
tors, comprise four similarly abundant major regulator 
classes. The differential expansion of regulator families in 
the fungal phyla and sub-phyla suggests that evolution of 
many regulators occurred after the divergence of these 
groups. The regulator distribution observed in our ana- 
lysis showed some differences compared with the previ- 
ously reported distributions in FTFD [25], most likely 
due to our expanded dataset. However, the abundance
of Zn2Cys6 and C2H2 domain regulators reported in
FTFD is compatible with our results.

As regulators play a major role in fungal physiology, 
their presence or absence may provide options and im- 
pose limitations on the natural habitat of fungal species. 
Analysis of the presence of individual transcription fac- 
tors demonstrated that regulators with a central role in 
fungal physiology are most commonly found throughout 
the fungal tree of life, while regulators with more spe- 
cific roles are less commonly present. This makes sense, 
as the loss of central regulators is likely to cause a sig- 
nificant competitive disadvantage for a species, unless 
transcriptional network functions are maintained by 
transcriptional rewiring. In contrast, the more specific 
regulators and the regulons they control will only be 
essential or advantageous in particular habitats. 

As the characterised query regulators for our tran- 
scription factor presence/absence analysis were mainly 
from ascomycete fungi, it is not surprising that a relatively 
low number of orthologs was found in the genomes from 
basidiomycetes. Regulation of gene expression is poorly 
studied in these fungi compared with ascomycetes, but 
our data suggest that many of the regulatory systems
have developed after the split of these two phyla. Interest- 
ingly, differences in the presence and absence of regulators 
were also found in closely related species. While it cannot 
be fully excluded that this can be due to gaps in the 
genome sequence or errors in gene annotation of spe- 
cific genomes, this does suggest that changes in the 
regulatory systems have also occurred more recently. 
Examples of this are the GalX/GalR system for regula- 
tion of galactose catabolism [18] and the protease regu- 
lator PrtT [21], which was shown to differ significantly 
between the Aspergilli, and the specific presence of the
cellulose regulator AceII [34].

While the absence of a particular regulator may accompany
loss of an entire regulon and therefore an altered
metabolic or developmental capability, its absence
could instead indicate transcriptional rewiring of the regulatory
mechanism. Conversely, presence of a regulator ortholog
does not necessarily indicate conserved function.



Recent studies have shown that transcription regulatory 
mechanisms can display considerable plasticity across 
species. For some regulons the regulator components 
are conserved but exhibit functional reassignment and 
rewired circuitry, resulting in rearrangements of tran- 
scriptional networks [35]. Other regulons share a con- 
served overall strategy but include additional regulator 
components to integrate additional regulatory signals, or 
show transfer of regulation from one regulator to an- 
other, or rewiring via evolution of combinatorial interac- 
tions between transcription factors [36-38]. The array of 
transcriptional rewiring possibilities indicates that while 
the absence of a particular transcription factor ortholog 
suggests regulatory differences or the loss of regulons, 
the presence of orthologs may, but does not necessarily,
indicate conserved function. Therefore functional analysis
is required to determine the role of each transcription
factor in each species.

Conclusion 

We have conducted an inventory of the thirty-seven 
PFAM transcription factor classes across 108 genomes 
of the two major fungal phyla and shown differential ex- 
pansion of transcription regulator classes between the 
ascomycetes and the basidiomycetes, with the largest 
expansion of Zn2Cys6 and fungal-specific domain regu- 
lators in the pezizomycotina. We also analyzed the pres- 
ence profiles for 64 known regulators in these 108 
genomes and found that regulators with central func- 
tions in fungal physiology were more commonly present 
than those with more specialised roles. The increasing 
number of fungal genome sequences and functional analyses
will provide better insight into the evolution of regulatory
systems, and in particular the 1000 fungal genome
project [39] will add to this as it aims to cover the
breadth of the fungal kingdom.

Methods 

Pilot experiment 

A bidirectional BlastP analysis was performed using as 
query the amino acid sequences of 48 selected regula- 
tors. To manually curate the results, alignments of the 
hits for each query regulator were performed using 
MUSCLE [40] and manually corrected in MEGA4 [41]. 
Phylogenetic trees were generated with MEGA4 using 
three algorithms: maximum parsimony, neighbor joining
and minimum evolution. The stability of the clades
was tested with 1000 bootstrap replicates. The results of 
the manual curation were used to define the parameters 
for the automated analysis of a larger set of genomes. 
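
One common way to interpret a bidirectional BlastP search is the reciprocal best hit criterion, sketched below. This is an assumption for illustration only: the exact criterion and thresholds used in the pilot are not stated here. The sketch expects rows in the standard 12-column BLAST tabular layout (query, subject, ..., E-value, bit score); the helper names and sequence identifiers are hypothetical.

```python
def best_hits(blast_rows):
    """Map each query to its highest-bit-score subject.

    blast_rows: tab-separated BLAST tabular rows with the bit score in column 12.
    """
    best = {}
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward_rows, reverse_rows):
    """Keep (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_rows)
    reverse = best_hits(reverse_rows)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)

def _row(query, subject, evalue, bitscore):
    # Alignment columns not used in this sketch get placeholder values.
    return "\t".join([query, subject, "50.0", "100", "0", "0",
                      "1", "100", "1", "100", evalue, bitscore])

# Hypothetical toy searches (NOT results from the study).
forward = [_row("AraR", "B_001", "1e-80", "300"),
           _row("AraR", "B_002", "1e-20", "90"),
           _row("GalR", "B_002", "1e-30", "120")]
reverse = [_row("B_001", "AraR", "1e-80", "300"),
           _row("B_002", "AraR", "1e-35", "130"),
           _row("B_002", "GalR", "1e-30", "120")]

pairs = reciprocal_best_hits(forward, reverse)
print(pairs)
```

In the toy data, B_002 is the best hit of GalR but itself hits AraR best, so that pair is rejected. Such automated filtering does not replace the alignment- and tree-based curation described above.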

Large-scale genome study 

108 completed fungal genomes were obtained from the
JGI fungal program [42], the Broad Institute of Harvard and
MIT [43], AspGD [44,45] and NCBI GenBank [46] (data
version March 2013). The Pfam-A HMM models were downloaded
from the Pfam database [47]. Regulator-related
domains were identified in each fungal genome with
HMMER v. 3.0 [48] using the trusted cutoff. Genome-scale
protein ortholog clusters were detected according to
[49], using inflation factor 1, E-value cutoff 1E-3 and
percentage match cutoff 60%, as for identification of distant
homologs [26]. The all-vs-all BlastP search required by
OrthoMCL was carried out in parallel on a grid of 500
computers. The ortholog clusters were then
curated manually using expert knowledge and literature
searches. Manual curation was aided by aligning the amino
acid sequences of the hits for each query together with a 
suitable outgroup by MAFFT [50,51], after which neigh- 
bor joining trees were generated using MEGA5 with 
1000 bootstraps. Genes that were clearly separated from 
the query branch in the trees were removed from the 
results. An example of this is given for AraR in Figure 3. 
Putative regulators containing more than one PFAM domain 
were assigned to the cluster based on the number of copies 
of domains found and/or the length of aligned area to the 
domain. PFAM families in 108 genomes were clustered by 
mismatch distance using Genesis [52]. The dendrogram was 
drawn by the complete linkage method using Genesis. A 
z-transformation of data was performed across families in 
order to generate the color scheme for visualization. 
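
The normalisation and clustering steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Genesis implementation: each PFAM family is z-transformed across species, Euclidean distance is used as a stand-in for the mismatch distance, and clusters are merged by complete linkage. All function names and counts are hypothetical.

```python
import math
from statistics import mean, pstdev

def z_transform(values):
    """Centre on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

def euclidean(a, b):
    # Stand-in distance (assumption); Genesis used a mismatch distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering; cluster distance is the largest pairwise distance."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(cluster) for cluster in clusters)

# Hypothetical regulator counts for three PFAM families (NOT from the study).
counts = {
    "species_A": [200, 150, 40],
    "species_B": [190, 160, 45],
    "species_C": [30, 20, 60],
    "species_D": [35, 25, 70],
}
names = list(counts)
z_columns = [z_transform(column) for column in zip(*counts.values())]
profiles = {name: [col[i] for col in z_columns] for i, name in enumerate(names)}

clusters = complete_linkage(profiles, 2)
print(clusters)
```

Scaling each family before clustering keeps large families such as Zn2Cys6 from dominating the distances purely because of their size.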

Availability of supporting data 

The data sets supporting the results of this article are 
included within the article and its additional files, or are 
available in the Dryad Digital Repository [53]. 